Abstract

We describe the shrinking neighborhood approach of Robust Statistics, which applies to
general smoothly parametrized models, in particular to exponential families. Equal generality is
achieved by an object-oriented implementation of the optimally robust estimators. We evaluate
the estimates on real datasets from the literature by means of our R packages ROptEst and RobLox.
1 Introduction

Following Huber (1997), p 61, the purpose of robustness is "to safeguard against deviations from 
the assumptions, in particular against those that are near or below the limits of detectability". 
The infinitesimal approach of Huber-Carol (1970), Rieder (1978) and Rieder (1980), Bickel (1981),
Rieder (1994) to robust testing and estimation, respectively, takes up this aim by employing shrinking
neighborhoods of the parametric model, where the shrinking rate $n^{-1/2}$, as the sample size
$n \to \infty$, may be deduced in a testing setup; confer Ruckdeschel (2006).

It is true that Huber's own minimum Fisher information approach refers to (small) neighborhoods
of fixed size; cf. Huber (1981). But it only treats variance, sets bias $= 0$ by assuming symmetry,
and is restricted to Tukey-type neighborhoods about location or scale models. It has not been
extended to simultaneous location and scale, let alone to more general models. Fraiman et al. (2001)
derive MSE optimality on fixed size neighborhoods. In situations beyond one-dimensional location,
however, they do not determine a solution in closed form either. The infinitesimal approach, by
contrast, provides closed-form robust solutions for general models (cf. Section 2.1) and fairly
general risks based on variance and bias (cf. Ruckdeschel and Rieder (2004)).

As noted by Huber (p 291 of Huber (1981)), in view of Theorem 3.7 of Rieder (1978), there is a 
close relation between the infinitesimal neighborhood approach and Hampel's Lemma 5 (cf. Hampel 
(1968)); see also Theorem 3.2 of Rieder (1980) and Theorem 5.5.7 of Rieder (1994). Differences from
Hampel et al. (1986) nevertheless exist and concern:

• definition of the influence curve, 

• necessity of the form of the optimally robust influence curves,

• optimality criterion: MSE and even more general criteria,

• determination of the bias bound (sensitivity),

• uniform asymptotics on neighborhoods, and 

• coverage of more models. 

A fourth robustness approach pursues efficiency in the ideal model subject to a high breakdown
point; confer for example Maronna et al. (2006), Sections 5.6.3, 5.6.4 and 6.4.5. A high breakdown
point, though, may easily be incorporated in our approach: Given some starting estimator $\theta_n$, we construct
our optimal estimators $S_n$ as one-step estimates,

$$ S_n = \theta_n + n^{-1}\bigl(\psi_{\theta_n}(x_1) + \cdots + \psi_{\theta_n}(x_n)\bigr) \qquad (1) $$

cf. Section 4. The procedure is called one-step reweighting in Section 5.6.3 of Maronna et al. (2006)
and has already been used in the Princeton robustness study (cf. Andrews et al. (1972)). Thus, as
$S_n - \theta_n$ is an average of the values $\psi_{\theta_n}(x_i)$, a bound $|\psi_\theta| \le b$ implies $|S_n - \theta_n| \le b$. Consequently, the
breakdown point of the starting estimator $\theta_n$ is inherited by our estimator $S_n$. Given the high breakdown,
however, we do not consider robustness as settled and then strive merely for high efficiency in the ideal
model. Our primary aim remains minmax MSE on shrinking neighborhoods about the ideal model, which
altogether complies with Huber (1997), p 61, that "a high breakdown point is nice to have if it comes for free".

The organisation of the paper is as follows: We review the theory of asymptotic robustness on
shrinking neighborhoods, add some recent results and specialize them. Then, we compute and apply the
infinitesimally robust estimators to datasets from the literature using our R packages ROptEst (general
models) and RobLox (normal location and scale); confer R Development Core Team (2008), Kohl
and Ruckdeschel (2008c) and Kohl (2008). Applications of infinitesimal neighborhood robustness
to time series will be the subject of another paper.

2 Setup 

2.1 General Smoothly Parametrized Models 

Denoting by $M_1(\mathcal A)$ the set of all probability measures on some measurable space $(\Omega, \mathcal A)$, we
consider a parametric model $\mathcal P = \{P_\theta \mid \theta \in \Theta\} \subset M_1(\mathcal A)$, whose parameter space $\Theta$ is an open
subset of some finite-dimensional $\mathbb R^k$, and which is dominated: $dP_\theta = p_\theta\, d\mu$ ($\theta \in \Theta$). At any fixed
$\theta \in \Theta$, model $\mathcal P$ is required to be $L_2$ differentiable, that is, to have $L_2$ differentiable square root
densities such that, in $L_2(\mu)$, as $t \to 0$,

$$ \sqrt{p_{\theta+t}} = \sqrt{p_\theta}\,\bigl(1 + \tfrac12\, t'\Lambda_\theta\bigr) + o(|t|) \qquad (2) $$

The $\mathbb R^k$-valued function $\Lambda_\theta \in L_2^k(P_\theta)$ is called the $L_2$ derivative, and its covariance $\mathcal I_\theta = E_\theta \Lambda_\theta \Lambda_\theta'$
under $P_\theta$ is the Fisher information of $\mathcal P$ at $\theta$, which is required to have full rank $k$. This type of
differentiability is implied by continuous differentiability of $p_\theta$ and continuity of $\mathcal I_\theta$ with respect to $\theta$,
and then $\Lambda_\theta = \frac{\partial}{\partial\theta}\log p_\theta$. Confer e.g. Lemma A.3 of Hajek (1972), Section 1.8 of Witting (1985),
Section 2.3 of Rieder (1994), Rieder and Ruckdeschel (2001).

Our main applications in this article concern exponential families, in which case

$$ p_\theta(x) = \exp\{\zeta(\theta)'T(x) - \beta(\theta)\}\, h(x) \qquad (3) $$

with some measurable functions $\zeta: \Theta \to \mathbb R^k$, $h: \Omega \to [0, \infty)$, $T: \Omega \to \mathbb R^k$ of positive definite
covariance $\operatorname{Cov}_\theta T \succ 0$, and the normalizing constant $\beta(\theta)$. Then $\mathcal P$ forms a $k$-dimensional
exponential family of full rank. The natural parameter space $Z$ consists of all $\zeta$-values such that
$0 < \int \exp\{\zeta'T(x)\}\, h(x)\, \mu(dx) < \infty$. $\mathcal P$ is $L_2$ differentiable under the following assumptions: $\zeta$ is
continuously differentiable in $\theta \in \Theta$ with regular Jacobian matrix $J_\zeta$, and $\zeta(\Theta) \subset Z^\circ$ (interior).

And then,

$$ \Lambda_\theta(x) = J_\zeta'\bigl(T(x) - E_\theta T\bigr), \qquad \mathcal I_\theta = J_\zeta' \operatorname{Cov}_\theta(T)\, J_\zeta \qquad (4) $$

where $E_\theta$ denotes expectation under $P_\theta$. The result mentioned in van der Vaart (1998), Example 7.7,
is proven in Kohl (2005), Lemma 2.3.6 (a). In what follows, the parametric model $\mathcal P$ is assumed $L_2$
differentiable at any $\theta \in \Theta$.
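To make (4) concrete, the following base-R sketch checks it numerically for the Poisson family of Section 5.4, written (as an assumption for illustration) with $T(x) = x$, $\zeta(\theta) = \log\theta$ and $h(x) = 1/x!$, so that $J_\zeta = 1/\theta$, $\Lambda_\theta(x) = (x - \theta)/\theta$ and $\mathcal I_\theta = 1/\theta$. Truncating the support at 200 is an illustrative choice; the neglected tail mass is far below machine precision.

```r
# Sketch: check formula (4) for the Poisson family p_theta(x) = exp(-theta) theta^x / x!,
# viewed as an exponential family with T(x) = x and zeta(theta) = log(theta).
theta <- 3.9
x <- 0:200                         # truncated support (illustrative choice)
p <- dpois(x, lambda = theta)

J_zeta <- 1 / theta                # derivative of zeta(theta) = log(theta)
ET     <- sum(x * p)               # E_theta T, equals theta
VarT   <- sum((x - ET)^2 * p)      # Cov_theta(T), equals theta

Lambda    <- J_zeta * (x - ET)     # L2 derivative from (4): J' (T(x) - E T)
I_formula <- J_zeta * VarT * J_zeta    # Fisher information from (4): J' Cov(T) J
I_direct  <- sum(Lambda^2 * p)         # E_theta Lambda^2, computed directly

c(I_formula = I_formula, I_direct = I_direct, closed_form = 1 / theta)
```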

2.2 Asymptotically Linear Estimators 

The founders of robust statistics have defined influence curves (IC) as Gâteaux derivatives of
statistical functionals; confer Section 2.5 of Huber (1981) and Section 2.1 of Hampel et al. (1986).
The classical definition, however, remains vague. Even if such a derivative exists, the definition is
not strong enough to cover the empirical distribution; confer Reeds (1976) and Fernholz (1983). Our approach
is different: Since most proofs of asymptotic normality in the i.i.d. case amount to an estimator
expansion with the IC as summands, we define the set of all (square integrable, $\mathbb R^k$-valued) ICs at
$P_\theta$ beforehand by

$$ \Psi_2(\theta) = \{\psi_\theta \in L_2^k(P_\theta) \mid E_\theta \psi_\theta = 0,\ E_\theta \psi_\theta \Lambda_\theta' = \mathbb I_k\} \qquad (5) $$

where $\mathbb I_k$ denotes the $k \times k$ identity matrix. Then we define asymptotically linear (AL) estimators $S$
to be any sequence of estimators $S_n: \Omega^n \to \mathbb R^k$ such that for some $\psi_\theta \in \Psi_2(\theta)$, necessarily unique,

$$ n^{1/2}(S_n - \theta) = n^{-1/2}\bigl(\psi_\theta(x_1) + \cdots + \psi_\theta(x_n)\bigr) + o_{P_\theta^n}(n^0) \qquad (6) $$

where $o_{P_\theta^n}(n^0) \to 0$ in product $P_\theta$ probability as $n \to \infty$. Thus, the originally intended interpretation
is achieved: $\psi_\theta(x_i)$ represents the asymptotic, suitably standardized influence of observation $x_i$
on $S_n$. The class of AL estimators as introduced by Rieder (1980), Definition 1.1 and Remarks,
and Rieder (1994), Section 4.2, covers M, L, R, S and MD (minimum distance) estimates.
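By the Lindeberg-Lévy CLT, as $\psi_\theta \in L_2^k(P_\theta)$ and $E_\theta \psi_\theta = 0$, AL estimators are asymptotically normal
under $P_\theta$,

$$ n^{1/2}(S_n - \theta)(P_\theta^n) \xrightarrow{\ w\ } \mathcal N\bigl(0, \operatorname{Cov}_\theta(\psi_\theta)\bigr) \qquad (7) $$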

The third condition $E_\theta \psi_\theta \Lambda_\theta' = \mathbb I_k$ is equivalent to the locally uniform extension of (7), with $\theta$ on
the LHS replaced by $\theta_n$ with $\limsup_{n\to\infty} \sqrt n\, |\theta_n - \theta| < \infty$.

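For the asymptotic variance under $P_\theta$, the Cramér-Rao bound holds,

$$ \operatorname{Cov}_\theta(\psi_\theta) \succeq \mathcal I_\theta^{-1} = \operatorname{Cov}_\theta(\psi_{h,\theta}), \qquad \psi_\theta \in \Psi_2(\theta) \qquad (8) $$

with equality iff $\psi_\theta = \psi_{h,\theta} := \mathcal I_\theta^{-1}\Lambda_\theta$, the classical scores.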
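For a concrete instance of (5) and (8), the sketch below takes the normal location-scale model of Section 5.2 with $\theta = (\mu, \sigma)'$, whose $L_2$ derivative is $\Lambda_\theta(x) = \bigl((x - \mu)/\sigma^2,\ ((x - \mu)^2 - \sigma^2)/\sigma^3\bigr)'$ with Fisher information $\operatorname{diag}(1/\sigma^2, 2/\sigma^2)$, and checks numerically that the classical scores satisfy the IC conditions and attain the Cramér-Rao bound. The values $\mu = 3.2$, $\sigma = 0.7$ and the integration range of $\pm 12$ standard deviations are illustrative assumptions.

```r
# Sketch: classical scores psi_h = I^{-1} Lambda in the normal location-scale model
# N(mu, sigma^2), theta = (mu, sigma)'; numerical check of the IC conditions (5)
# and of Cov(psi_h) = I^{-1} in (8). Base R only; mu and sigma are illustrative values.
mu <- 3.2; sigma <- 0.7

Lambda  <- function(x) rbind((x - mu) / sigma^2, ((x - mu)^2 - sigma^2) / sigma^3)
I_theta <- diag(c(1 / sigma^2, 2 / sigma^2))        # Fisher information
psi_h   <- function(x) solve(I_theta, Lambda(x))    # 2 x length(x) matrix

# E_theta f(X) for a scalar-valued, vectorized f; +-12 sd leaves out ~1e-32 of the mass
E <- function(f) integrate(function(x) f(x) * dnorm(x, mu, sigma),
                           mu - 12 * sigma, mu + 12 * sigma, rel.tol = 1e-8)$value

mean_psi <- sapply(1:2, function(i) E(function(x) psi_h(x)[i, ]))
cross    <- outer(1:2, 1:2, Vectorize(function(i, j)
  E(function(x) psi_h(x)[i, ] * Lambda(x)[j, ])))   # E psi_h Lambda', should be I_2
cov_psi  <- outer(1:2, 1:2, Vectorize(function(i, j)
  E(function(x) psi_h(x)[i, ] * psi_h(x)[j, ])))    # Cov(psi_h), should be I^{-1}
round(cov_psi, 6)
```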

2.3 Infinitesimal Perturbations 

The i.i.d. observations $x_1, \dots, x_n$ may now follow any law $Q$ in some neighborhood about $P_\theta$. In this
article, the type of neighborhoods in Rieder (1994) will be restricted to (convex) contamination
($* = c$) and total variation ($* = v$). Delegating the total variation case to Appendix A, the
system $\mathcal U_c(\theta)$ thus consists of all contamination neighborhoods

$$ \mathcal U_c(\theta, s) = \{(1 - s)P_\theta + sQ \mid Q \in M_1(\mathcal A)\}, \qquad 0 \le s \le 1 \qquad (9) $$

Subsequently, $s = s_n = r n^{-1/2}$ for starting radius $r \in [0, \infty)$ and $n \to \infty$.

Remark 1. Under $Q$, the parameter $\theta$ still has to be estimated. Since the equation $Q = P_\theta + (Q - P_\theta)$,
involving the nuisance component $Q - P_\theta$, may have multiple solutions $\theta$, the parameter $\theta$ is no longer
identifiable. This problem has been dealt with by estimating functionals that extend the parametrization
to the neighborhoods. As noted in Section 4.3.3 of Rieder (1994), however, both approaches (estimating $\theta$
directly and estimating such functionals) lead to the same optimally robust ICs and procedures once the
choice of the functional is subjected to robustness criteria.

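We now fix $\theta \in \Theta$ and introduce the bounded tangents at $P_\theta$,

$$ Z_\infty(\theta) = \{q \in L_\infty(P_\theta) \mid E_\theta q = 0\} \qquad (10) $$

Along any $q \in Z_\infty(\theta)$ and for starting radius $r \in [0, \infty)$, simple perturbations are defined by

$$ dQ_n(q, r) = (1 + r n^{-1/2} q)\, dP_\theta \qquad (11) $$

provided that $n^{1/2} > -r \inf_{P_\theta} q$, where $\inf_{P_\theta}$ denotes the $P_\theta$-essential infimum. AL estimators,
under such simple perturbations, are still asymptotically normal,

$$ n^{1/2}(S_n - \theta)\bigl(Q_n^n(q, r)\bigr) \xrightarrow{\ w\ } \mathcal N\bigl(r E_\theta \psi_\theta q,\ \operatorname{Cov}_\theta(\psi_\theta)\bigr) \qquad (12) $$

with bias $r E_\theta \psi_\theta q$. We have $Q_n(q, r) \in \mathcal U_c(\theta, r n^{-1/2})$ iff $q \in \mathcal G_c(\theta)$ for the class

$$ \mathcal G_c(\theta) = \{q \in Z_\infty(\theta) \mid \inf_{P_\theta} q \ge -1\} \qquad (13) $$

Confer Rieder (1994), proof of Proposition 4.3.6 and Lemma 5.3.1.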



3 Optimally Robust Influence Curves 
3.1 Maximum Risk 

Our aim is minmax risk. Employing a continuous loss function $\ell: \mathbb R^k \to [0, \infty)$, the asymptotic
maximum risk of any estimator sequence on contamination neighborhoods about $P_\theta$ of size $r n^{-1/2}$
is

$$ \lim_{M\to\infty} \lim_{n\to\infty} \sup_{Q_n \in \mathcal U_c(\theta, r n^{-1/2})} \int \ell_M\bigl(n^{1/2}(S_n - \theta)\bigr)\, dQ_n^n \qquad (14) $$

where, for ease of attainability of the minimum risk, the truncated loss functions $\ell_M = \min\{M, \ell\}$
are employed. A further simplified and smaller risk is obtained by a restriction to simple perturbations
$Q_n = Q_n(q, r)$ with $q \in \mathcal G_c(\theta)$ and the interchange of $\sup_{q \in \mathcal G_c(\theta)}$, $\lim_{M\to\infty}$, and $\lim_{n\to\infty}$.
The fixed $\theta$ will be dropped from notation henceforth whenever feasible. Thus, for an AL estimator
$S = (S_n)$ with IC $\psi$ at $P = P_\theta$, and $Z \sim \mathcal N_k(0, \operatorname{Cov}(\psi))$,

$$ \sup_{q \in \mathcal G_c} \lim_{M\to\infty} \lim_{n\to\infty} \int \ell_M\bigl(n^{1/2}(S_n - \theta)\bigr)\, dQ_n^n(q, r) = \sup_{q \in \mathcal G_c} E\,\ell(r E\psi q + Z) \qquad (15) $$

For the square $\ell(z) = |z|^2$, the (maximum, asymptotic) MSE is obtained as a weighted sum of the squared
$L_2$- and $L_\infty$-norms of $\psi$ under $P$,

$$ \operatorname{MSE}(\psi, r) = E|\psi|^2 + r^2 \omega_c^2(\psi) \qquad (16) $$

since

$$ \omega_c(\psi) = \sup\{|E\psi q| \mid q \in \mathcal G_c(\theta)\} = \sup_P |\psi| \qquad (17) $$

the $P$-essential sup of $|\psi|$; confer Sections 5.3.1 and 5.5.2 of Rieder (1994).
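The supremum in (17) can be watched at work in a small example. The sketch below assumes the one-dimensional $\mathcal N(0, 1)$ location model and a Huber-type IC with an arbitrarily chosen clipping point; the tangents $q_t = \mathbf 1\{x > t\}/P(X > t) - 1$ are one convenient (hypothetical) family in $\mathcal G_c$, and $E\psi q_t$ climbs to $\sup_P|\psi|$ as $t$ grows. The upper bound holds in general, since $q \ge -1$ and $Eq = 0$ give $E\psi q = E\psi(1 + q) \le \sup_P|\psi|$.

```r
# Sketch: worst-case bias (17) over contamination tangents, illustrated for a Huber-type IC in
# the N(0,1) location model. The clipping point c0 = 1.5 is an arbitrary illustrative choice.
c0  <- 1.5
A   <- 1 / (2 * pnorm(c0) - 1)              # makes E psi(X) X = 1, so psi is an IC
psi <- function(x) A * pmax(-c0, pmin(x, c0))
b   <- A * c0                               # sup |psi|

# q_t = 1{x > t} / P(X > t) - 1 is bounded, centred, with inf q_t = -1, hence q_t is in G_c.
# As E psi = 0, E psi q_t = E[psi(X) | X > t]; the closed form below is valid for t >= -c0.
bias_t <- function(t) {
  num <- if (t < c0) A * (dnorm(t) - dnorm(c0)) + b * pnorm(c0, lower.tail = FALSE)
         else b * pnorm(t, lower.tail = FALSE)
  num / pnorm(t, lower.tail = FALSE)
}
ts   <- c(0, 0.5, 1, 1.5, 3)
bias <- sapply(ts, bias_t)
round(rbind(t = ts, bias = bias, sup_psi = b), 4)
```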

Other (convex, monotone) combinations of bias and variance (e.g., $L_p$-risks) have been considered
in Ruckdeschel and Rieder (2004).

A suitable construction ensures that, in case of the optimally robust estimator, risk (14) is not
larger than the simplified risk (15); confer Section 4 below.

3.2 Minmax Mean Square Error 

The optimally robust $\psi^*$, the unique solution to minimize $\operatorname{MSE}(\psi, r)$ among all $\psi \in \Psi_2$, is given in
Theorem 5.5.7 of Rieder (1994): There exist some vector $z \in \mathbb R^k$ and matrix $A \in \mathbb R^{k\times k}$, $A \ne 0$,
such that

$$ \psi^* = A(\Lambda - z)\, w, \qquad w = \min\bigl\{1,\ b\,|A(\Lambda - z)|^{-1}\bigr\} \qquad (18) $$

where

$$ r^2 b = E\bigl(|A(\Lambda - z)| - b\bigr)_+ \qquad (19) $$

and

$$ 0 = E(\Lambda - z)\, w, \qquad A^{-1} = E(\Lambda - z)(\Lambda - z)'\, w \qquad (20) $$

Conversely, form (18)-(20) suffices for $\psi^*$ to be the solution.
The proof uses the Lagrange multipliers supplied by Rieder (1994), Appendix B.
The minmax solution to the more general risks considered in Ruckdeschel and Rieder (2004) is also
an MSE solution with suitably transformed bias weight; confer their Theorem 4.1 and equation (4.7).
The matrix $A$, in case $r = 0$, equals the inverse Fisher information $\mathcal I^{-1}$, which appears in the Cramér-Rao
bound (8). In general, $A$ is defined by (19) and (20) only implicitly. It is surprising that the
statistical interpretation in terms of minimum risk carries over to this extension, with bias now involved.

Theorem 1. For any $r \in (0, \infty)$ and $\psi \in \Psi_2$ we have

$$ \operatorname{MSE}(\psi, r) \ge \operatorname{tr} A = \operatorname{MSE}(\psi^*, r) \qquad (21) $$

where equality holds in the first place iff $\psi = \psi^*$ defined by (18)-(20).
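In the one-dimensional $\mathcal N(0, 1)$ location model, (18)-(20) can be solved with a single root search, because symmetry gives $z = 0$ and $\psi^*$ is Huber's clipped score. The base-R sketch below is an illustration under these assumptions (it is not the ROptEst algorithm); it also confirms $\operatorname{MSE}(\psi^*, r) = \operatorname{tr} A$ from (21) and that other clipping points do worse.

```r
# Sketch: minmax-MSE IC (18)-(20) in the one-dimensional N(0,1) location model, Lambda(x) = x.
# By symmetry z = 0, so psi*(x) = A x min(1, k / |x|) with clipping point k = b / A (Huber).
minmax_location <- function(r) {
  # (19) divided by A: r^2 k = E(|X| - k)_+ = 2 (dnorm(k) - k P(X > k))
  f <- function(k) r^2 * k - 2 * (dnorm(k) - k * pnorm(k, lower.tail = FALSE))
  k <- uniroot(f, lower = 1e-10, upper = 50, tol = 1e-12)$root
  A <- 1 / (2 * pnorm(k) - 1)   # (20): A^{-1} = E X^2 min(1, k / |X|) = 2 Phi(k) - 1
  list(r = r, k = k, A = A, b = A * k)
}

# MSE (16) of the IC A(k) clip(x, -k, k), with A(k) = 1 / (2 Phi(k) - 1), at radius r
mse_clipped <- function(k, r) {
  A <- 1 / (2 * pnorm(k) - 1)
  E_psi2 <- A^2 * (2 * pnorm(k) - 1 - 2 * k * dnorm(k) + 2 * k^2 * pnorm(k, lower.tail = FALSE))
  E_psi2 + r^2 * (A * k)^2
}

sol <- minmax_location(r = 0.5)
c(k = sol$k, b = sol$b, trA = sol$A, MSE = mse_clipped(sol$k, 0.5))   # MSE equals tr A

# Other clipping points also give ICs, but with larger MSE, as Theorem 1 asserts
ks <- c(0.5, 1, 1.5, 2, 3)
round(rbind(k = ks, MSE = mse_clipped(ks, 0.5)), 4)
```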

3.3 Relative MSE 

The starting radius $r$ for the neighborhoods $\mathcal U_c(\theta, r n^{-1/2})$, on which the minmax MSE solution $\psi^* =
\psi_r^*$ depends, will often be unknown or only known to belong to some interval $[r_{lo}, r_{up}) \subset [0, \infty)$. In
this situation, where $\psi_s^*$ is used when in fact $\psi_r^*$ is optimal, we introduce the relative MSE of $\psi_s^*$ at
radius $r$,

$$ \operatorname{relMSE}(\psi_s^*, r) = \operatorname{MSE}(\psi_s^*, r) \,/\, \operatorname{MSE}(\psi_r^*, r) \qquad (22) $$

For any radius $s \in [r_{lo}, r_{up})$, the $\sup_r \operatorname{relMSE}(\psi_s^*, r)$ is attained at the boundary,

$$ \sup_{r \in [r_{lo}, r_{up})} \operatorname{relMSE}(\psi_s^*, r) = \operatorname{relMSE}(\psi_s^*, r_{lo}) \vee \operatorname{relMSE}(\psi_s^*, r_{up}) \qquad (23) $$

A least favorable radius $r_0$ is defined by achieving the $\inf_s$ of $\sup_r \operatorname{relMSE}(\psi_s^*, r)$, that is,

$$ \inf_{s \in [r_{lo}, r_{up})} \sup_{r \in [r_{lo}, r_{up})} \operatorname{relMSE}(\psi_s^*, r) = \sup_{r \in [r_{lo}, r_{up})} \operatorname{relMSE}(\psi_{r_0}^*, r) \qquad (24) $$

and is characterized by $\operatorname{relMSE}(\psi_{r_0}^*, r_{lo}) = \operatorname{relMSE}(\psi_{r_0}^*, r_{up})$.

The IC $\psi_{r_0}^*$, respectively the AL estimator with this IC, is called radius-minmax (rmx) and
recommended.

Confer Kohl (2005), in particular Lemma 2.2.3, and Rieder et al. (2008). 

The recommendation is in some sense independent of the loss function: In case of unspecified radius
(i.e., $r_{lo} = 0$, $r_{up} = \infty$), the rmx IC is the same for a variety of loss functions satisfying a weak
homogeneity condition; confer Ruckdeschel and Rieder (2004), Theorem 6.1.
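Continuing the one-dimensional $\mathcal N(0, 1)$ location illustration (again an assumption, not the location-scale model treated by RobLox), the least favorable radius $r_0$ can be found by a root search on the characterization $\operatorname{relMSE}(\psi_{r_0}^*, r_{lo}) = \operatorname{relMSE}(\psi_{r_0}^*, r_{up})$. The radius interval below is borrowed from the copper example of Section 5.2 purely for illustration.

```r
# Sketch: least favourable radius r0 of Section 3.3 in the one-dimensional N(0,1) location
# model (an illustrative stand-in; RobLox solves the location-scale problem instead).
ic_at <- function(r) {                       # minmax IC (18)-(20): clipped score, z = 0
  f <- function(k) r^2 * k - 2 * (dnorm(k) - k * pnorm(k, lower.tail = FALSE))
  k <- uniroot(f, lower = 1e-10, upper = 50, tol = 1e-12)$root
  A <- 1 / (2 * pnorm(k) - 1)
  list(k = k, A = A, b = A * k)
}
mse <- function(ic, r) {                     # MSE (16) of a clipped IC at radius r
  k <- ic$k; A <- ic$A
  A^2 * (2 * pnorm(k) - 1 - 2 * k * dnorm(k) + 2 * k^2 * pnorm(k, lower.tail = FALSE)) +
    r^2 * ic$b^2
}
relMSE <- function(s, r) mse(ic_at(s), r) / ic_at(r)$A   # (22), with MSE(psi_r*, r) = tr A_r

# Radius interval of the copper example (n = 24, 5% to 20% contamination), for illustration
r_lo <- sqrt(24) * 0.05
r_up <- sqrt(24) * 0.20
# r0 equalizes the relative MSE at both ends of the interval
r0 <- uniroot(function(s) relMSE(s, r_lo) - relMSE(s, r_up),
              lower = r_lo, upper = r_up, tol = 1e-10)$root
c(r0 = r0, maxRelMSE = relMSE(r0, r_lo))
```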

3.4 Cniper Contamination 

The notion of cniper contamination is suited to demonstrate how relatively small outliers suffice to
destroy the superiority of the classical procedure. Employing, for this purpose, contaminations
$R_n := (1 - r n^{-1/2})P + r n^{-1/2}\delta_a$ by Dirac measures $\delta_a$ in $a \in \mathbb R$, the asymptotic MSE of the classically optimal estimator
(i.e., with IC $\psi_h = \mathcal I^{-1}\Lambda$) under $R_n$ is $\operatorname{MSE}_a(\psi_h, r) := \operatorname{tr}\mathcal I^{-1} + r^2 |\psi_h(a)|^2$. Relating this quantity
to the minmax $\operatorname{MSE} = \operatorname{tr} A$ (Theorem 1), we are interested in the set $C$ of values $a \in \mathbb R$ such that
$\operatorname{MSE}_a(\psi_h, r) > \operatorname{MSE}(\psi^*, r)$; that is,

$$ r^2 |\psi_h(a)|^2 > \operatorname{tr} A - \operatorname{tr}\mathcal I^{-1} \qquad (25) $$

In all models we have considered so far, rather small values $a$ suffice to fulfill (25). In a Janus-type
pun on the words "nice" and "pernicious", the boundary values of $C$ are called cniper points (acting
like a sniper); confer Ruckdeschel (2004) and Kohl (2005), Introduction.
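In the one-dimensional $\mathcal N(0, 1)$ location model (an illustrative assumption), $\psi_h(a) = a$ and $\operatorname{tr}\mathcal I^{-1} = 1$, so (25) describes $C$ as $\{a : |a| > \sqrt{\operatorname{tr}A - 1}/r\}$. A base-R sketch, recomputing the one-dimensional minmax solution of Section 3.2:

```r
# Sketch: cniper points from (25) in the one-dimensional N(0,1) location model, where
# psi_h(a) = a and tr I^{-1} = 1, so C = {a : |a| > sqrt(tr A - 1) / r}.
cniper_location <- function(r) {
  f <- function(k) r^2 * k - 2 * (dnorm(k) - k * pnorm(k, lower.tail = FALSE))
  k <- uniroot(f, lower = 1e-10, upper = 50, tol = 1e-12)$root
  trA <- 1 / (2 * pnorm(k) - 1)              # minmax MSE = tr A (Theorem 1)
  list(trA = trA, a = sqrt(trA - 1) / r)     # boundary of C
}
r  <- sqrt(24) * 0.10                        # illustrative radius: n = 24, 10% contamination
cp <- cniper_location(r)
c(cniper = cp$a, P_C = 2 * pnorm(-cp$a))     # C = (-Inf, -a] U [a, Inf) under N(0, 1)
```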

4 Estimator Construction 

Given the optimally robust IC $\psi_\theta$, one for each $\theta \in \Theta$, the problem is to construct an estimator $S^* =
(S_n^*)$ that is AL at each $\theta$ with IC $\psi_\theta$. In addition, the construction should ensure that there is no
increase from the simplified risk (15) to the asymptotic maximum MSE (14).

We require initial estimators $\sigma = (\sigma_n)$ which are $n^{1/2}$ consistent on the full neighborhood system
$\mathcal U_c(\theta)$; that is, for each $r \in [0, \infty)$,

$$ \lim_{M\to\infty} \limsup_{n\to\infty} \sup\bigl\{Q_n^{(n)}\bigl(n^{1/2}|\sigma_n - \theta| > M\bigr) \bigm| Q_{n,i} \in \mathcal U_c(\theta, r n^{-1/2})\bigr\} = 0 \qquad (26) $$

with $Q_n^{(n)} = Q_{n,1} \otimes \cdots \otimes Q_{n,n}$. For technical reasons, the $\sigma_n$ are in addition discretized in a suitable
sense (cf. Rieder (1994), Section 6.4.2).

In this article, the optimally robust ICs $\psi_\theta$ are bounded. Thus conditions (2)-(6) of Rieder (1994),
p 247, on $(\psi_\theta)_{\theta\in\Theta}$ simplify drastically; namely, to continuity in sup-norm,

$$ \lim_{\tau \to \theta} \sup_{x \in \Omega} |\psi_\tau(x) - \psi_\theta(x)| = 0 \qquad (27) $$

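Then, according to Rieder (1994), Theorem 6.4.8 (b), the one-step estimator $S^*$,

$$ S_n^* = \sigma_n + n^{-1}\bigl(\psi_{\sigma_n}(x_1) + \cdots + \psi_{\sigma_n}(x_n)\bigr) \qquad (28) $$

where $\sigma_n = \sigma_n(x_1, \dots, x_n)$, is uniformly asymptotically normal such that, for all arrays $Q_{n,i} \in
\mathcal U_c(\theta, r n^{-1/2})$ and each $r \in (0, \infty)$,

$$ n^{1/2}(S_n^* - \theta - B_n)\bigl(Q_n^{(n)}\bigr) \xrightarrow{\ w\ } \mathcal N\bigl(0, \operatorname{Cov}_\theta(\psi_\theta)\bigr) \qquad (29) $$

with $B_n = n^{-1}\bigl(\int \psi_\theta\, dQ_{n,1} + \cdots + \int \psi_\theta\, dQ_{n,n}\bigr)$. Employing a version $\psi_\theta$ of form (18)-(20) which is
bounded pointwise by $b = b_\theta$, we obtain

$$ |B_n| \le \sup_{x \in \Omega} |\psi_\theta(x)| = b_\theta \qquad (30) $$

and, since $\int \psi_\theta\, dP_\theta = 0$, each summand of $B_n$ is even bounded by $r n^{-1/2} b_\theta$, so that $n^{1/2}|B_n| \le r\, b_\theta$.
Thus (29) ensures that risk (14) is not larger than the simplified risk (15).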
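The following base-R sketch shows the arithmetic of (28) on the copper data of Section 5.2, with several simplifying assumptions: location only, scale fixed at the MAD, the median as starting estimator (one of the choices listed in Remark 2), and a hypothetical clipping point $k = 1$ rather than the rmx value. Because $\psi$ is bounded by $b = Ak$, the one-step correction can move the start by at most $b$ times the fixed scale.

```r
# Sketch: one-step construction (28) for location only, with a Huber-type IC and the scale
# held fixed at the MAD. The clipping point k = 1 is a hypothetical choice, not the rmx value,
# and fixing the scale simplifies the location-scale procedure used in Section 5.2.
x <- c(2.20, 2.20, 2.40, 2.40, 2.50, 2.70, 2.80, 2.90,
       3.03, 3.03, 3.10, 3.37, 3.40, 3.40, 3.40, 3.50,
       3.60, 3.70, 3.70, 3.70, 3.70, 3.77, 5.28, 28.95)   # copper data of Section 5.2

k   <- 1                                     # hypothetical clipping point (standardized scale)
A   <- 1 / (2 * pnorm(k) - 1)                # turns the clipped score into an IC at N(0, 1)
psi <- function(u) A * pmax(-k, pmin(u, k))  # bounded by b = A * k

start <- median(x)                           # starting estimator (median, cf. Remark 2)
s_hat <- mad(x)                              # fixed scale (normal-consistent MAD)
S_n   <- start + s_hat * mean(psi((x - start) / s_hat))   # one-step estimate (28)
c(start = start, one_step = S_n, max_shift = s_hat * A * k)
```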

Remark 2. As initial estimators we prefer MD estimates, not primarily because of their breakdown point
but because of their related tail behavior (cf. Ruckdeschel (2008a)) and their applicability in general models.
In particular, both Kolmogorov and Cramér-von Mises MD (CvM) estimates may be employed (cf. Rieder
(1994), Theorems 6.3.7 and 6.3.8), with an advantage of the latter in view of the larger neighborhoods to
which its $n^{1/2}$ consistency extends, and of the variance instability, for finite $n$, of the former (cf. Donoho
and Liu (1988)). In particular models, other estimators may qualify as starting estimators and may even be
preferable for computational reasons; e.g., median and MAD in one-dimensional location and scale, the minimum
covariance determinant estimator in multivariate scale, least median of squares, and S estimates in linear
regression; confer Rousseeuw and Leroy (1987) and Yohai (1987).

Remark 3. Under additional smoothness, according to Ruckdeschel (2008a) and Ruckdeschel (2008b),
assumption (26) of $n^{1/2}$ consistency may be weakened to only $n^{1/4+\delta}$ consistency, for some $\delta > 0$.
Consequently, for example, the least median of squares estimator may be employed as a high breakdown starting
estimator. Ruckdeschel (2008b) gives other, partly more, partly less stringent conditions. Moreover,
Ruckdeschel (2008a) ensures uniform integrability so as to dispense with the truncation of unbounded loss
functions in (14).

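The remainder of the section deals with condition (27). We assume that the Lagrange multipliers
$A_\theta$ and $a_\theta := A_\theta z_\theta$ in (18)-(20) are unique, and, as $\tau \to \theta$,

$$ \Lambda_\tau(P_\tau) \xrightarrow{\ w\ } \Lambda_\theta(P_\theta), \qquad \operatorname{tr}\mathcal I_\tau \to \operatorname{tr}\mathcal I_\theta \qquad (31) $$

$$ \sup_{x \in D_c} |\Lambda_\tau(x) - \Lambda_\theta(x)| + \sup_{x \notin D_c} \frac{|\Lambda_\tau(x) - \Lambda_\theta(x)|}{|A_\theta\Lambda_\theta(x) - a_\theta|} \to 0 \qquad (32) $$

where $D_c = \{x \in \Omega \mid |A_t\Lambda_t(x) - a_t| \le b_t \text{ for } t = \tau \text{ or } t = \theta\}$. Then, by Kohl (2005), Theorem 2.3.3,
condition (27) is fulfilled.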

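For example, in case of a location and scale model with location parameter $\beta \in \mathbb R$ and scale parameter
$\sigma \in (0, \infty)$, we have $\Lambda_\theta(x) = \sigma^{-1}\Lambda_{\theta_0}\bigl((x - \beta)/\sigma\bigr)$, hence $\Lambda_\theta(P_\theta) = \sigma^{-1}\Lambda_{\theta_0}(P_{\theta_0})$ and $\mathcal I_\theta = \sigma^{-2}\mathcal I_{\theta_0}$,
where $\theta = (\beta, \sigma)'$ and $\theta_0 = (0, 1)'$. Therefore, (31) is fulfilled. Condition (32) needs further checking
but seems plausible as $\Lambda_{\theta_0}$ is continuous (if the model is to be $L_2$ differentiable).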
In the case of an L2 differentiable exponential family, in view of (4), condition (31) is satisfied, 
while (32) holds according to Kohl (2005), Lemma 2.3.6. 
5 Applications



5.1 Proposal 

Based on the presented results we make the following proposal for applications: 
Step 1: Decide on the ideal model. 

Step 2: Decide on the type of neighborhood (* = c or * = v). 

Step 3: Determine lower and upper bounds $s_{lo}$, $s_{up}$ for the size $s = s_n$ of the neighborhoods
$\mathcal U_*(\theta, s)$ to be taken into account.

Step 4: Put $r_{lo} = n^{1/2}s_{lo}$, $r_{up} = n^{1/2}s_{up}$, and compute the rmx IC for $[r_{lo}, r_{up}]$.

Step 5: Evaluate an appropriate starting estimator. 

Step 6: Determine the rmx estimator using the one-step construction. 
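Step 4 is plain arithmetic; the sketch below converts the contamination sizes used in Sections 5.2 to 5.4 into radii $r = n^{1/2}s$ and reproduces the single radii quoted there for the cniper computations.

```r
# Sketch: Step 4 for the three datasets of Sections 5.2 to 5.4, r = sqrt(n) * s.
radius <- function(n, s) sqrt(n) * s
rbind(
  copper  = c(n = 24,   r_lo = radius(24,   0.05),  r_up = radius(24,   0.20)),
  gamma   = c(n = 201,  r_lo = radius(201,  0.005), r_up = radius(201,  0.05)),
  poisson = c(n = 2608, r_lo = radius(2608, 0.01),  r_up = radius(2608, 0.05))
)
# Single radii used for the cniper computations in Sections 5.2 to 5.4
round(c(radius(24, 0.10), radius(201, 0.025), radius(2608, 0.03)), 2)   # 0.49 0.35 1.53
```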

Our R packages RobLox (cf. Kohl (2008)) and ROptEst (cf. Kohl and Ruckdeschel (2008c)) provide
an easy way to perform steps 4-6, making use of our packages distr (cf. Ruckdeschel et al. (2006)),
distrEx (cf. Ruckdeschel et al. (2006)), distrMod (cf. Ruckdeschel et al. (2008)), RandVar (cf. Kohl
and Ruckdeschel (2008a)) and RobAStBase (cf. Kohl and Ruckdeschel (2008b)).
The implementation of these packages heavily relies on S4 classes and methods; confer Chambers
(1998). Based on this object-oriented approach, package ROptEst provides an implementation
that (so far) works for all(!) $L_2$ differentiable parametric models which are based on a univariate
distribution.

In the sequel, we will demonstrate the use of packages RobLox and ROptEst by application to some 
datasets from literature. 

5.2 Normal Location and Scale 

We consider the following 24 measurements (in parts per million) of copper in wholemeal flour 
(cf. Analytical Methods Committee (1989)) 

2.20 2.20 2.40 2.40 2.50 2.70 2.80 2.90 
3.03 3.03 3.10 3.37 3.40 3.40 3.40 3.50 
3.60 3.70 3.70 3.70 3.70 3.77 5.28 28.95 

where the value 28.95 is clearly conspicuous. In agreement with Maronna et al. (2006), Section 2.1,
in view of the majority of the data, we assume normal location and scale as the ideal model,
$P_\theta = \mathcal N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma)'$, $\mu \in \mathbb R$, $\sigma \in (0, \infty)$. Let us stick to contamination neighborhoods
($* = c$). We assume that roughly 1 to 5 observations, that is, roughly 5% to 20% of the 24 observations,
are erroneous. Then the matrix $A$ and centering vector $a = Az$ in (18)-(20), by absolute continuity
of the normal distribution, are unique. Since normal location and scale also is an $L_2$ differentiable
exponential family, the assumptions for our estimator construction are fulfilled. We choose the
Cramér-von Mises MD estimator (CvM) as initial estimator.

The following R code shows how function roptest of package ROptEst can be applied to perform 
the computations, where x represents the data, 

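R> library(ROptEst)
R> x <- c(2.20, 2.20, 2.40, 2.40, 2.50, 2.70, 2.80, 2.90, 3.03, 3.03, 3.10, 3.37,
+        3.40, 3.40, 3.40, 3.50, 3.60, 3.70, 3.70, 3.70, 3.70, 3.77, 5.28, 28.95)
R> roptest(x = x, L2Fam = NormLocationScaleFamily(),
+    neighbor = ContNeighborhood(), eps.lower = 0.05,
+    eps.upper = 0.20, distance = CvMDist)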
Table 1: Normal location and scale estimates

Estimator               $\hat\mu$   $\hat\sigma$
mean & sd               4.28        5.30
median & MAD            3.39        0.53
Huber M (Proposal 2)    3.21        0.67
Yohai MM                3.16        0.66
CvM                     3.23        0.67
rmx (roptest)           3.16        0.66
rmx (roblox)            3.23        0.64


Figure 1: rmx IC computed via roblox (panels: location part, scale part).



More specialized to the normal ideal model is the function roblox of package RobLox, which only
works for, and is optimized for speed in, normal location and scale. It uses median and MAD as
starting estimates, which is justified by Kohl (2005), Section 2.3.4.

R> library(RobLox)
R> roblox(x = x, eps.lower = 0.05, eps.upper = 0.20)

Table 1 shows the results of these computations as well as mean, standard deviation and some well-known
robust estimators. The robust estimators, from median & MAD to rmx (roblox), yield very similar
results, while, obviously, mean and standard deviation represent the data badly. Figure 1 shows
the location and scale parts of the rmx IC computed via function roblox. The location part of the
rmx IC, as that of any optimally robust IC in this model, is redescending. Thus, in our setup, redescending
ICs arise on optimality grounds. For another derivation of redescending M-estimators see Shevlyakov et al.
(2008).
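Why the location part redescends can be seen from the form (18) alone: at $\theta_0 = (0, 1)'$ the $L_2$ derivative is $\Lambda(x) = (x, x^2 - 1)'$, so $|A(\Lambda - z)|$ grows like $x^2$ and the weight $w$ decays like $x^{-2}$, which makes the location component decay like $1/x$. In the sketch below, $A$, $z$ and $b$ are hypothetical values chosen only to show this shape; they are not the solutions of (19) and (20) and not the roblox output.

```r
# Sketch: shape of an IC of form (18) in the N(0, 1) location-scale model, with
# Lambda(x) = (x, x^2 - 1)'. A, z and b are hypothetical illustration values, NOT the
# solutions of (19)-(20); they only serve to show why the location part redescends.
A <- diag(c(1.2, 0.6)); z <- c(0, 0); b <- 2

psi <- function(x) {
  Y <- A %*% (c(x, x^2 - 1) - z)           # A (Lambda(x) - z)
  w <- min(1, b / sqrt(sum(Y^2)))          # weight w in (18)
  drop(Y * w)
}
xs  <- c(0.5, 1, 2, 3, 5, 10, 20)
loc <- sapply(xs, function(x) psi(x)[1])   # location part
round(rbind(x = xs, location = loc), 3)
```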

Figure 2: Observed frequencies and fitted Gamma densities (panel title: Length of stays; x-axis: Length of stay).

Based on these robust estimates, let us assume a mean of $\mu = 3.2$ and a standard deviation of
$\sigma = 0.7$ for the ideal distribution $P_\theta = \mathcal N(3.2, 0.7^2)$. For a contamination of $s_n = 10\%$ at a
sample size of $n = 24$ (i.e., $r \approx 0.49$), the cniper points are calculated to be 1.86 and 4.54, and
$C = (-\infty, 1.86] \cup [4.54, \infty)$. Under any element of $\mathcal U_c(\theta, s_n)$ the probability of $C$ is 5% to 15%, where
$P_\theta(C) = 5.56\%$.
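The probabilities quoted above follow from the normal distribution function. A base-R sketch, where the neighborhood range uses $Q(C) = 0$ and $Q(C) = 1$ in $(1 - s_n)P_\theta(C) + s_n Q(C)$:

```r
# Sketch: probability of the cniper set C = (-Inf, 1.86] U [4.54, Inf) under the ideal
# N(3.2, 0.7^2), and its range over U_c(theta, s_n) via (1 - s_n) P_theta(C) + s_n Q(C).
p_C <- pnorm(1.86, mean = 3.2, sd = 0.7) +
       pnorm(4.54, mean = 3.2, sd = 0.7, lower.tail = FALSE)
s_n <- 0.10
range_C <- c(lower = (1 - s_n) * p_C,        # Q(C) = 0
             upper = (1 - s_n) * p_C + s_n)  # Q(C) = 1
round(100 * c(P_theta = p_C, range_C), 2)
```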

5.3 Gamma Model 

We analyze the lengths of stay of 201 patients in the University Hospital of Lausanne during
the year 2000 (cf. Hubert and Vandervieren (2006)). Following Marazzi et al. (1998), we use the
Gamma model $p_\theta(x) = \Gamma(\alpha)^{-1}\sigma^{-\alpha}x^{\alpha-1}e^{-x/\sigma}$ with shape and scale parameters $\alpha, \sigma \in (0, \infty)$ and
$\theta = (\sigma, \alpha)'$. By Kohl (2005), Section 6.1, this exponential family is $L_2$ differentiable. We assume
contamination neighborhoods ($* = c$) but, on visual inspection of the data, of only small size
$0.5\% \le s_n \le 5\%$. Then, due to absolute continuity of $P = P_\theta$, equations (18)-(20) yield unique
solutions $A$ and $a = Az$. Thus, the one-step construction of the rmx estimator, based on the
CvM estimate, applies. The algorithm can be performed by applying function roptest of package
ROptEst, where x contains the data,

R> library(ROptEst)
R> roptest(x = x, L2Fam = GammaFamily(),
+    neighbor = ContNeighborhood(), eps.lower = 0.005,
+    eps.upper = 0.05, distance = CvMDist)

a call very similar to the one in the previous example. In fact, the unified call of roptest
applies to any smooth model. Figure 2 compares the densities of the estimated Gamma distributions
with the histogram of the data. Table 2 shows the results as well as the MLE and the CvM estimate. Again,
the MLE is strongly affected by a few very large observations, whereas the robust estimators stay
closer to the bulk of the data. Figure 3 shows the scale and shape parts of the rmx IC (those of any
optimally robust IC look similar; confer Kohl (2005), Figure 6.1).
Table 2: Gamma scale and shape estimates

Estimator   $\hat\sigma$ (scale)   $\hat\alpha$ (shape)
MLE         7.00                   1.61
CvM         6.53                   1.54
rmx         4.97                   1.86


Assuming the ideal Gamma distribution $P_\theta$ with $\theta = (5.0, 1.9)'$ and a contamination size $s_n = 2.5\%$
at $n = 201$ (i.e., $r \approx 0.35$), the cniper points are 0.62 and 29.31, and $C = (-\infty, 0.62] \cup [29.31, \infty)$.
Under any element of $\mathcal U_c(\theta, s_n)$ the probability of $C$ is 2.5% to 5%, where $P_\theta(C) = 2.63\%$.
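Likewise for the Gamma example. The sketch assumes $\theta = (\sigma, \alpha)'$ with the scale first, as introduced above, i.e. scale 5.0 and shape 1.9:

```r
# Sketch: P_theta(C) in the Gamma example, assuming theta = (sigma, alpha)' = (5.0, 1.9)',
# i.e. scale 5.0 and shape 1.9, and C = (-Inf, 0.62] U [29.31, Inf).
p_C <- pgamma(0.62, shape = 1.9, scale = 5.0) +
       pgamma(29.31, shape = 1.9, scale = 5.0, lower.tail = FALSE)
s_n <- 0.025
round(100 * c(P_theta = p_C, lower = (1 - s_n) * p_C, upper = (1 - s_n) * p_C + s_n), 2)
```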



5.4 Poisson Model 

For the decay counts of polonium recorded by Rutherford and Geiger (1910), 

counts      0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
frequency  57  203  383  525  532  408  273  139   45   27   10    4    0    1    1

we assume the Poisson model $p_\theta(x) = e^{-\theta}\theta^x/x!$, which as an exponential family is $L_2$ differentiable in
the parameter $\theta \in (0, \infty)$ (cf. Kohl (2005), Section 4.1).
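For computations, the frequency table can be expanded into the raw counts. The sketch below lists the counts 0 to 14, with frequency 0 for count 12, so that the total is $n = 2608$; the resulting sample mean, which is the Poisson MLE, can be compared with the MLE entry of Table 3.

```r
# Sketch: Rutherford-Geiger counts as a raw data vector and the Poisson MLE (sample mean).
counts <- 0:14
freq   <- c(57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 4, 0, 1, 1)
x <- rep(counts, times = freq)     # one entry per observed interval
c(n = length(x), MLE = mean(x))
```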

Table 3: Poisson mean estimates

Estimator       $\hat\theta$
MLE             3.8715
CvM             3.8953
rmx ($* = c$)   3.9131
rmx ($* = v$)   3.9133

Figure 4: Observed and fitted frequencies (panel title: Decay counts of polonium; x-axis: count; legend: observed, MLE, rmx (* = c, v)).

For both contamination ($* = c$) and total variation neighborhoods ($* = v$) of size $0.01 \le s_n \le 0.05$
we compute the rmx estimator. But, in case $* = c$, $a = Az$ may be non-unique, which happens
if $\operatorname{med}_P(\Lambda)$, the median of $\Lambda = \Lambda_\theta$ under $P = P_\theta$, is non-unique and $r = n^{1/2}s_n$ is at least the so-called
lower case radius $\bar r$ (cf. Kohl (2005), Section 2.1.2). The non-uniqueness of the median occurs for
only countably many values $\theta$. Since, as our numerical evaluations show, already small deviations
($\approx \pm 10^{-8}$) from the exceptional values lead to a unique $a$, non-uniqueness may be neglected in
practice; confer Kohl (2005), Sections 4.2.1 and 4.4. In case $* = v$, the one-step construction
applies without restrictions; confer Appendix A. Then, using the CvM as starting estimator, the
rmx estimators are obtained via the following calls to function roptest of package ROptEst, where
x contains the data,

R> library(ROptEst)
R> roptest(x = x, L2Fam = PoisFamily(), neighbor = ContNeighborhood(),
+    eps.lower = 0.01, eps.upper = 0.05, distance = CvMDist)
R> roptest(x = x, L2Fam = PoisFamily(), neighbor = TotalVarNeighborhood(),
+    eps.lower = 0.01, eps.upper = 0.05, distance = CvMDist)

for contamination and total variation neighborhoods, respectively. The results as well as the MLE
and CvM estimate are given in Table 3. The estimates differ only slightly, as the data, in view of the
observed and fitted frequencies in Figure 4, appear in very good agreement with the Poisson model.
Figure 5 shows the rmx ICs for contamination and total variation neighborhoods. In fact, any
optimally robust IC is of similar form (cf. Kohl (2005), Figures 4.1 ($* = c$) and 4.14 ($* = v$)).

Remark 4. ICs are defined with respect to the ideal model, thus, in case of the Poisson model, on $\mathbb N_0$. If
we want to allow distributions in the neighborhoods whose supports are more generally in $[0, \infty)$, we only
need to extend $\psi^*$ from $\mathbb N_0$ to $[0, \infty)$ such that $|\psi^*(x)| \le b$ for each $x \ge 0$; confer (30) in the estimator
construction.

Figure 5: rmx IC computed via roptest for * = c, v (panels: contamination (* = c), total variation (* = v); x-axis: x).

Assuming the ideal Poisson distribution $P_\theta$ with $\theta = 3.9$, neighborhood type $* = c$ and a contamination
size $s_n = 3\%$ at $n = 2608$ (i.e., $r \approx 1.53$), we get the cniper points 1.26 and 6.54, and
$C = [0, 1.26] \cup [6.54, \infty)$. Under any element of $\mathcal U_c(\theta, s_n)$ the probability of $C$ is 19.5% to 22.5%, where
$P_\theta(C) = 20.0\%$.
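On $\mathbb N_0$, the cniper set above is $\{0, 1\} \cup \{7, 8, \dots\}$, so $P_\theta(C)$ is a sum of two Poisson tail probabilities. A base-R sketch:

```r
# Sketch: P_theta(C) for the Poisson model with theta = 3.9; on N_0 the cniper set
# C = [0, 1.26] U [6.54, Inf) reduces to {0, 1} U {7, 8, ...}.
theta <- 3.9
p_C <- ppois(1, lambda = theta) + ppois(6, lambda = theta, lower.tail = FALSE)
round(100 * p_C, 1)
```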



A Total variation neighborhoods (* = v) 

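The system $\mathcal U_v(\theta)$ consists of the closed balls of radius $s$ about $P_\theta$ in the total variation metric
$d_v(Q, P_\theta) = \sup_{A \in \mathcal A} |Q(A) - P_\theta(A)|$,

$$ \mathcal U_v(\theta, s) = \{Q \in M_1(\mathcal A) \mid d_v(Q, P_\theta) \le s\}, \qquad 0 \le s \le 1 \qquad (33) $$

which have the following representation in terms of contamination neighborhoods,

$$ \mathcal U_v(\theta, s) - P_\theta = \bigl(\mathcal U_c(\theta, s) - P_\theta\bigr) - \bigl(\mathcal U_c(\theta, s) - P_\theta\bigr) \qquad (34) $$

In particular, $\mathcal U_c(\theta, s) \subset \mathcal U_v(\theta, s)$ follows. In our asymptotics, $s = s_n = r n^{-1/2}$ for some $r \in [0, \infty)$,
as the sample size $n \to \infty$. Corresponding simple perturbations $Q_n(q, r)$ are defined by (10) and (11)
with tangents $q$ in the class

$$ \mathcal G_v(\theta) = \{q \in Z_\infty(\theta) \mid E_\theta|q| \le 2\} = \mathcal G_c(\theta) - \mathcal G_c(\theta) \qquad (35) $$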

We fix $\theta$ and drop it from notation. Then, with $\sup_e$ extending over all unit vectors $e$ in $\mathbb R^k$, the
standardized (infinitesimal) bias term of an IC $\psi \in \Psi_2$ is

$$ \omega_v(\psi) = \sup\{|E\psi q| \mid q \in \mathcal G_v(\theta)\} = \sup_e \bigl(\sup_P e'\psi - \inf_P e'\psi\bigr) \qquad (36) $$

The exact bias term in case $k > 1$ is difficult to handle and has been dealt with only in exceptional
cases (cf. Rieder (1994), p 205 and Theorem 7.4.17). The obvious bounds $\omega_c(\psi) \le \omega_v(\psi) \le 2\omega_c(\psi)$
suggest an approximate solution by a reduction to the contamination case $* = c$ and radius $2r$.
An exact solution of the MSE problem with bias term $\omega_v$ is still possible in dimension $k = 1$, in
which case $\omega_v(\psi) = \sup_P \psi - \inf_P \psi$. In case $k = 1$, the optimally robust IC $\psi^*$, the unique solution
to minimize $\operatorname{MSE}(\psi, r) = E\psi^2 + r^2\omega_v^2(\psi)$ among all ICs $\psi \in \Psi_2$, is provided by Rieder (1994),
Theorem 5.5.7: For some numbers $c$, $b$, $A$,

$$ \psi^* = c \vee A\Lambda \wedge (c + b) \qquad (37) $$

where

$$ r^2 b = E(c - A\Lambda)_+ = E\bigl(A\Lambda - (c + b)\bigr)_+ \qquad (38) $$

and

$$ E\bigl(c \vee A\Lambda \wedge (c + b)\bigr)\Lambda = 1 \qquad (39) $$

Conversely, form (37)-(39) suffices for $\psi^*$ to be the solution.

The solutions $A$, $b$ and $c$ of equations (37)-(39) are always unique, as discussed in Section B.1
below. Moreover, the condition that, as $\tau \to \theta$,

$$ \sup_{x \in D_v} |\Lambda_\tau(x) - \Lambda_\theta(x)| + \sup_{x \notin D_v} \frac{|\Lambda_\tau(x) - \Lambda_\theta(x)|}{|\Lambda_\theta(x)|} \to 0 \qquad (40) $$

where $D_v = \{x \in \Omega \mid c_t < A_t\Lambda_t(x) < b_t + c_t \text{ for } t = \tau \text{ or } t = \theta\}$, has been verified by Kohl (2005),
Lemma 2.3.6, in the case $* = v$, $k = 1$, for $L_2$ differentiable exponential families. Thus, the one-step
construction is valid.
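In the symmetric $\mathcal N(0, 1)$ location model, an assumption made only for this illustration, the reduction to $* = c$ with radius $2r$ mentioned above is in fact exact: symmetry gives $c = -b/2$, and (38) and (39) reduce to the contamination equations at radius $2r$, with $b_v = 2b_c$. The base-R sketch below solves both problems and compares them; beyond such symmetric cases, the text above treats the reduction only as an approximation.

```r
# Sketch: total variation solution (37)-(39) in the N(0,1) location model (k = 1).
# Symmetry gives c = -b/2, so psi* = A clip(x, -kv, kv) with kv = b / (2A), and
#   (39): A = 1 / (2 Phi(kv) - 1),   (38): 2 r^2 kv = dnorm(kv) - kv P(X > kv).
solve_tv <- function(r) {
  f  <- function(k) 2 * r^2 * k - (dnorm(k) - k * pnorm(k, lower.tail = FALSE))
  kv <- uniroot(f, lower = 1e-10, upper = 50, tol = 1e-12)$root
  A  <- 1 / (2 * pnorm(kv) - 1)
  list(k = kv, A = A, b = 2 * A * kv, c = -A * kv)
}
# Contamination solution (18)-(20) of Section 3.2: r^2 k = 2 (dnorm(k) - k P(X > k)).
solve_c <- function(r) {
  f <- function(k) r^2 * k - 2 * (dnorm(k) - k * pnorm(k, lower.tail = FALSE))
  k <- uniroot(f, lower = 1e-10, upper = 50, tol = 1e-12)$root
  A <- 1 / (2 * pnorm(k) - 1)
  list(k = k, A = A, b = A * k)
}
r  <- 0.3                                   # illustrative radius
tv <- solve_tv(r)
cc <- solve_c(2 * r)                        # contamination problem at radius 2r
rbind(total_variation  = c(k = tv$k, A = tv$A, b = tv$b),
      contamination_2r = c(k = cc$k, A = cc$A, b = cc$b))
```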



B Auxiliary Results And One Proof 

B.1 Boundedness, Uniqueness, Continuity Of Lagrange Multipliers

We discuss boundedness, uniqueness, and continuity of the Lagrange multipliers $A$, $a = Az$, $b$ and $c$
in the optimally robust IC $\psi^*$. These properties are, on the one hand, reassuring for the convergence of
our numerical algorithms. On the other hand, they imply the continuity in sup-norm (27) required
for the construction.

Boundedness: Given $r > 0$, bounds for the solutions $A$, $a = Az$, $b$ and $c$ of (18)-(20) and (37)-(39),
respectively, are derived in Kohl (2005), Section 2.1.3. For example, $|a| \le r^2 b$ holds.

Uniqueness: The Lagrange multipliers (like the separating hyperplanes) need not be unique;
confer Rieder (1994), Remark B.2.10 (a). But, at least, $\operatorname{tr} A$, $b$, and $c$ in (18)-(20) and (37)-(39),
respectively, are unique since, in terms of the unique $\psi^*$,

$$ \operatorname{tr} A = \operatorname{MSE}(\psi^*, r), \qquad b = \omega_*(\psi^*), \qquad c = \inf_P \psi^* \qquad (41) $$

If $k = 1$ and $\operatorname{med}_P(\Lambda)$ is unique, then $a$ is unique; Rieder (1994), Lemma C.2.4. In case $k = 1$ and
$\operatorname{med}_P(\Lambda)$ is non-unique, $a$ is unique for $r < \bar r$ (the so-called lower case radius); confer Kohl
(2005), Proposition 2.1.3.

In case $* = c$, $k > 1$, uniqueness of $A$ and $a$ is ensured by the assumption that

$$ \operatorname{supp}\Lambda(P) = \mathbb R^k \qquad (42) $$

confer Rieder (1994), Remark 5.5.8. $A$ and $a$ are unique also under the more implicit condition
that, for any hyperplane $H \subset \mathbb R^k$,

$$ P(\Lambda \in H) < P(|\psi^*| < b) \qquad (43) $$

which certainly is satisfied if $P(\Lambda \in H) = 0$ for any hyperplane $H$; that is,

$$ e \in \mathbb R^k,\ \alpha \in \mathbb R,\ P(e'\Lambda = \alpha) > 0 \implies e = 0 \qquad (44) $$

confer Rieder (1994), Section 5.5.3. Both (42) and (44) imply that $\mathcal I \succ 0$.

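Continuity in $\theta$: Denote by $\psi_\theta^*$ the MSE solution to variable parameter $\theta \in \Theta$ and fixed radius
$r \in (0, \infty)$. Then, under assumption (31), we obtain

$$ \operatorname{tr} A_\tau \to \operatorname{tr} A_\theta, \qquad b_\tau \to b_\theta, \qquad c_\tau \to c_\theta \qquad (45) $$

as $\tau \to \theta$. Provided that $A_\theta$ and $a_\theta$ are unique, moreover

$$ A_\tau \to A_\theta, \qquad a_\tau \to a_\theta \qquad (46) $$

Confer Kohl (2005), Theorem 2.1.11.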

Continuity in $r$: Continuity in $r$ is needed for the rmx estimator. Denoting by $A_r$, $a_r = A_r z_r$,
$b_r$, and $c_r$ the solutions of (18)-(20) and (37)-(39), respectively, for fixed $\theta$ and variable $r \in (0, \infty)$,
Kohl (2005), Proposition 2.1.9, says that

$$ \operatorname{tr} A_s \to \operatorname{tr} A_r, \qquad b_s \to b_r, \qquad c_s \to c_r \qquad (47) $$

as $s \to r$. Moreover, in case that $A_r$ and $a_r$ are unique,

$$ A_s \to A_r, \qquad a_s \to a_r \qquad (48) $$

For the rmx estimator, in addition some monotonicity in r is needed and supplied by Ruckdeschel 
and Rieder (2004), Kohl (2005), and Rieder et al. (2008). 

B.2 Proof of Theorem 1 

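$$ \min\max\operatorname{MSE} = E|\eta|^2 + r^2 b^2 = -E\eta'(Y - \eta) + E\eta'Y + r^2 b^2 $$

with the abbreviations $\eta := \psi^*$ and $Y := A(\Lambda - z)$ (in case $* = v$, $k = 1$, simply $Y := A\Lambda$), where
$E\eta'Y = \operatorname{tr} E\eta Y' = \operatorname{tr} A' = \operatorname{tr} A$ since $E\eta\Lambda' = \mathbb I_k$ and $E\eta = 0$.

$* = c$: In this case, $\eta \ne Y$ iff $|Y| > b$, and thus $E\eta'(Y - \eta) = bE(|Y| - b)_+ = r^2 b^2$.

$* = v$, $k = 1$: In this case, $E\eta(Y - \eta) = bE(c - Y)_+ = r^2 b^2$.

In both cases, $\min\max\operatorname{MSE} = -r^2 b^2 + \operatorname{tr} A + r^2 b^2 = \operatorname{tr} A$.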